Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning

Authors

  • Petar Kormushev
  • Kohei Nomoto
  • Fangyan Dong
  • Kaoru Hirota
Abstract

A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides Time Hopping with abilities similar to those that eligibility traces provide for conventional Reinforcement Learning: it propagates values from one state to all of its temporal predecessors using a state-transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than a factor of 3.
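As a rough illustration of the mechanism the abstract describes, the Python sketch below pushes a value change backward through a state-transition graph in a tabular Q-learning setting. The class name, the incremental graph-building interface, the reward-free backup, and the convergence threshold are illustrative assumptions made for this sketch, not the authors' implementation (which the full text details).

```python
from collections import defaultdict, deque


class EligibilityPropagationSketch:
    """Tabular Q-values plus a backward state-transition graph (a sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, threshold=1e-3):
        self.Q = defaultdict(float)            # Q[(state, action)]
        self.predecessors = defaultdict(set)   # state -> {(prev_state, action)}
        self.actions = actions                 # assumed shared across states
        self.alpha, self.gamma, self.threshold = alpha, gamma, threshold

    def record_transition(self, s, a, s_next):
        # The graph is built incrementally as transitions are experienced.
        self.predecessors[s_next].add((s, a))

    def propagate(self, s_changed):
        # Breadth-first sweep from the updated state back through all of its
        # temporal predecessors; propagation stops where changes become small.
        queue = deque([s_changed])
        while queue:
            s = queue.popleft()
            v = max(self.Q[(s, a)] for a in self.actions)
            for (sp, ap) in self.predecessors[s]:
                # One-step backup toward the changed state (reward term
                # omitted here for brevity).
                delta = self.alpha * (self.gamma * v - self.Q[(sp, ap)])
                if abs(delta) > self.threshold:
                    self.Q[(sp, ap)] += delta
                    queue.append(sp)
```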


Similar articles


Time Hopping Technique for Reinforcement Learning and its Application to Robot Control

To speed up the convergence of reinforcement learning (RL) algorithms by more efficient use of computer simulations, three algorithmic techniques are proposed: Time Manipulation, Time Hopping, and Eligibility Propagation. They are evaluated on various robot control tasks. The proposed Time Manipulation [1] is a concept of manipulating the time inside a simulation and using it as a tool to speed...

Full text

Time Hopping technique for faster reinforcement learning in simulations

A technique called Time Hopping is proposed for speeding up reinforcement learning algorithms. It is applicable to continuous optimization problems running in computer simulations. Making shortcuts in time by hopping between distant states, combined with off-policy reinforcement learning, allows the technique to maintain a higher learning rate. Experiments on a simulated biped crawling robot confir...

Full text
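For intuition about the hopping idea this entry summarizes, here is a minimal Python sketch assuming a simulator that can save and restore its full internal state. The env.snapshot()/env.restore() interface, the agent methods, and the uniform choice of hop target are hypothetical stand-ins, not the paper's actual selection criteria.

```python
import random


def time_hopping_episode(env, agent, snapshots, steps=1000, hop_prob=0.05):
    """Run one learning pass, occasionally hopping to a saved state."""
    state = env.reset()
    for _ in range(steps):
        if snapshots and random.random() < hop_prob:
            # Hop: restore a previously saved simulator state instead of
            # continuing along the current trajectory.
            state = env.restore(random.choice(snapshots))
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        # An off-policy update (e.g. Q-learning) stays valid across hops,
        # since it does not require an unbroken trajectory.
        agent.update(state, action, reward, next_state)
        snapshots.append(env.snapshot())
        state = env.reset() if done else next_state
```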

Macro-Actions in Reinforcement Learning: An Empirical Analysis (Amy McGovern and Richard S. Sutton)

Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on t...

Full text

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and highligh...

Full text
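As background for the traces this entry mentions, the sketch below shows a single tabular, Watkins-style Q(lambda) update, in which one TD error is spread over all recently visited state-action pairs. The deep recurrent variant the cited paper studies is not reproduced here, and trace cutting on exploratory actions is omitted for brevity; Q and trace can both be collections.defaultdict(float) instances keyed by (state, action).

```python
def q_lambda_step(Q, trace, s, a, r, s_next, actions,
                  alpha=0.1, gamma=0.99, lam=0.9):
    """One Watkins-style Q(lambda) update over all eligible pairs."""
    a_star = max(actions, key=lambda act: Q[(s_next, act)])
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]  # single TD error
    trace[(s, a)] += 1.0                                 # accumulating trace
    for key in list(trace):
        Q[key] += alpha * delta * trace[key]  # one error updates many pairs
        trace[key] *= gamma * lam             # traces decay over time-steps
    return delta
```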


Journal:
  • JACIII

Volume 13, Issue

Pages -

Publication date: 2009